[57] ABSTRACT

The present invention relates to a process and system for real-time monitoring of a data processing system for administration and maintenance support of the data processing system in the operating phase, which data processing system communicates in a client/server mode through interconnected networks (W), each client (WCL) comprising a browser (BRO) which supports a high-level hypertext language. Intelligent agents are installed in each server (WSE) for running, once client requests have been formulated, a check on the status of each server, measuring and storing parameter information indicating the status and the behavior of the server at a given moment, which parameter information is automatically collected as a function of the domains examined and systematically processed by the server so as to be offered in the form of presentation reports contained in dynamically evolving pages, while the client's browser accesses the dynamic pages containing the collected and processed information responding to a request.

15 Claims, 1 Drawing Sheet 
PROCESS AND SYSTEM FOR REAL-TIME
MONITORING OF A DATA PROCESSING 
SYSTEM FOR ITS ADMINISTRATION AND 
MAINTENANCE SUPPORT IN THE 
OPERATING PHASE 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates to a process and system for 
real-time monitoring of a data processing system for its 
administration and maintenance support in the operating 
phase, which data processing system communicates in the 
client/server mode through interconnected networks, each 
client comprising a browser which supports a high-level 
hypertext language. 

2. Related Art 

Generally, a distributed management environment makes it possible to integrate the administration of systems, networks and user applications. The dialogue between the various machines of the system and/or between the various users is organized around requests and responses to these requests, the most common requests in a network being related to access to files or access to data. An application is said to be designed according to a "client/server" architecture when it is composed of two independent programs which cooperate with one another to carry out the same operation, each of which runs in its own environment (machine, operating system), while a programming interface using a language constituted by commands makes it possible to control their dialogue. The client/server mode has the advantage of allowing a user (for example of a simple microcomputer), called a client, to delegate part of his task or some of his operations to a server for execution. In this way, the client has a greater computing capacity at his disposal than that of his microcomputer. Likewise, a client can address a specialized server and effectively outsource an operation, the server offering optimum conditions in terms of implementation and expertise by virtue of its specialization. In this context, up to the present time, providing real-time monitoring of a data processing system for its administration and maintenance support in the operating phase has involved the development of a specific application for each client. This represents a considerable drawback: a technological choice of this type is very costly and prohibits simple upgrading, since a modification, an addition or a new development inevitably requires a modification, an addition or a new development for each specific application.

Faced with this technical problem, for which no effective solution existed, a second, fundamentally different approach may be envisaged: developing a generic client application and only upgrading the server. Once the technical problem had been posed in this way, a solution was created by observing systems operating in interconnected networks and applying this technique analogously to administrative applications. In effect, the dialogue of all "client/server" entities can be established through one or more networks which can be interconnected (the Internet, for example), in which case TCP/IP (Transmission Control Protocol/Internet Protocol) is the most commonly used protocol. These networks constitute a veritable world-wide "web" (as it is commonly referred to by one skilled in the art), making it possible to connect multimedia servers to one another and forming the equivalent of an immense hypertext multimedia document which is described using high-level hypertext languages such as, for example, the language HTML (HyperText Markup Language), the communications between the users (clients) and the servers being provided by the protocol HTTP (HyperText Transfer Protocol). A hypertext language like HTML defines the logical organization of the information, particularly with hypertext links (between texts, images, sounds, video sequences), for the development of content at the level of the server, but not its formatting (that is, its organization into pages), which is handled by the client's software. A client in this context is equipped with a navigator (called a "browser" by one skilled in the art) used for browsing and scanning the information organized into pages offered by the various servers.

However, these pages constructed by the servers are static, which presents a substantial drawback when it is desirable to provide real-time monitoring of an administrative system in the operating phase. In effect, for efficient utilization, the evolution of the system over time (states of the machines, malfunctions, etc.) must be accessible and quickly known, and the pages constructed must be presented not in static form but in a dynamically evolving form. Moreover, another drawback is apparent from the simple fact that the operation of a machine requires a minimum of intervention and knowledge of its environment. If a problem arises, it is necessary to establish a diagnosis, which demands a certain technical expertise, in order to rapidly discover the existence and then the source of the problem and to make a correction or possibly repair the machine, which is not necessarily within the competence of the average user.

SUMMARY OF THE INVENTION 

The object of the present invention is to eliminate the various drawbacks of the solutions of the prior art and to offer a process that is simple, effective, and inexpensive to implement and that allows the real-time monitoring of a data processing system for its administration and maintenance support in the operating phase by constructing and presenting the necessary information in the form of dynamically evolving pages.

For this purpose, the monitoring process mentioned in the preamble is noteworthy in that, in order to carry out this monitoring, intelligent agents are installed in each server which, once client requests have been formulated, run a check on the status of each server, measuring and storing parameters indicating the status and the behavior of the server at a given moment. This information is automatically collected as a function of the domains examined and systematically processed by the server so as to be offered in the form of presentation reports contained in the dynamically evolving pages constructed in this way, while the client's browser accesses these dynamic pages containing the collected and processed information responding to his request.
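The key point of the process is that the server builds each page when the request arrives, instead of serving a stored static file. Python's standard `http.server` module can illustrate this. What follows is a minimal sketch under stated assumptions, not the patented implementation. `measure()` is a hypothetical stand-in for the intelligent agents and reports only a timestamp and the CPU count. `render_page()` plays the role of page construction.

```python
import os
import time
from html import escape
from http.server import BaseHTTPRequestHandler


def measure():
    """Hypothetical agent probe: parameters describing the server at this moment."""
    return {
        "checked_at": time.strftime("%Y-%m-%d %H:%M:%S"),
        "cpu_count": os.cpu_count(),
    }


def render_page(params):
    """Build the HTML page from freshly measured parameters."""
    rows = "".join(
        f"<tr><td>{escape(str(k))}</td><td>{escape(str(v))}</td></tr>"
        for k, v in params.items()
    )
    return f"<html><body><h1>Server status</h1><table>{rows}</table></body></html>"


class StatusHandler(BaseHTTPRequestHandler):
    """Answers every GET with a page constructed at request time."""

    def do_GET(self):
        body = render_page(measure()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

Serving it with `HTTPServer(("", 8080), StatusHandler).serve_forever()` (the port is an arbitrary choice) lets any ordinary browser display the page. Two requests made a few seconds apart receive different `checked_at` values, and no client-side development is needed.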

Thus, according to the concept of the invention, and contrary to all expectations, an effective, fast, easy and inexpensive solution is offered thanks to the use of intelligent agents in the servers for constructing dynamic pages that can be browsed and read by the browsers of the clients, which clients need no development in order to browse and read the dynamic information received in response to their requests. These pages can thus dynamically adapt to the various requests, which also means that a plurality of clients asking the same question at a given moment do not necessarily receive the same response. This technological option makes it possible to reduce costs significantly, since any development for upgrading or expanding the initial system need only be implemented in the server. No specific software needs to be provided or used by the client, who need only be equipped with a commercial browser, for example of the Netscape Navigator type (trademark registered by Netscape Communications Corporation), which is a low-cost, universally known and used product with which, moreover, most microcomputers are currently equipped.

Remarkably, the systematic processing by the server of the information collected for the construction of the dynamic pages is carried out in successive steps, each of which corresponds to the processing of this information by a particular module, the specific main modules being the following:

a page construction module, which receives the client's request from the network and prepares the pages by collecting the necessary data in the appropriate levels depending on its operating system, then presents this data to a generic tool module for aiding in the construction, which is independent of the operating system,

a semantic knowledge module of the operating system, which fetches the low-level data in an information construction module that depends on the operating system, then constructs a high-level semantic representation of the data requested,

an information construction module of the operating system, which uses commands that depend on the operating system to acquire low-level data and provides an application interface for communicating with the semantic knowledge module of the operating system,

a generic tool module for aiding in the construction of the pages in high-level hypertext language.

Advantageously, for the application of the monitoring process according to the invention, the construction of a page is implemented on the server end and is obtained by processing in various modules disposed at a plurality of levels according to their dependencies on the operating system. Two main levels are to be observed: the first relates to the construction of a page with the semantic knowledge of the operating system but without a direct interface with this operating system, while the second relates to the implementation of physical accesses to the information of the operating system.

BRIEF DESCRIPTION OF THE DRAWING

The following description with reference to the appended drawing, given entirely as a non-limiting example, will make it clearly understood how the invention may be implemented.

The sole FIGURE represents, in highly schematic fashion, an exemplary exchange of information (requests and responses to requests) between a client and a server communicating through interconnected networks, using the process according to the invention.

DESCRIPTION OF THE PREFERRED EMBODIMENT

Within the scope of the application of this process, the general architecture (machines and networks) is based on the architecture of the interconnected networks ("web"). This means that when a server is connected to a local area network or to a wide area network (respectively a LAN or WAN to one skilled in the art), this process can be applied and thus rendered active from any machine connected to the network, without having to install specific software or tools or even management agents (called "proxy agents" by one skilled in the art). In this way, a server WSE connected to the network W and used at any point in the world can be monitored from any other point in the world. It can be monitored from a microcomputer WCL connected to the network W, for example a simple PC comprising a browser BRO, preferably of the type which supports a high-level hypertext language HTML, such as the browser known as the NETSCAPE NAVIGATOR. All the presentation work is carried out on the client microcomputer WCL using the browser BRO. The transfer of the data between the client WCL and the server WSE is carried out through the network W using the standard communication protocol HTTP over the multilayer protocol TCP/IP. When the client WCL sends a request because it wishes to display a new page on its screen, according to [...], raw data are [...] by the server WSE. The server constructs semantic data, adds information related to the alarms or problems encountered, and [...] the page in HTML language. This page is transmitted through the network to the client in response to its request, so that it can be displayed once the HTML text received has been interpreted.

The systematic processing by the server of the information collected for the construction of dynamic pages is carried out in successive steps, each of which corresponds to the processing of this information by a particular module. As indicated above, for the application of the monitoring process, the construction of a page is carried out on the server end WSE and is obtained by processing in various modules disposed at a plurality of levels according to their dependencies on the operating system OS. Two main levels are to be observed. The first, L1, relates to the construction of a page with the semantic knowledge of the operating system but without a direct interface with this operating system. The second, L2, relates to the implementation of physical accesses to the information of the operating system OS. The specific main modules are described in more detail below.

The page construction module PM, when it receives the request from the client sent through the network W, prepares the pages by retrieving the necessary information in the appropriate levels depending on its operating system OS. It then presents this information to a generic tool module GT for aiding in the construction, which is likewise independent of the operating system OS. With the page construction module PM, it is possible to construct a graphic for representing a datum; likewise, a photograph can be used to present a [...] or a hardware component. Each page is produced by grouping several modules. Each module has a particular semantic from a given viewpoint, in a format chosen from a set of [...]. For example, the correlation between the physical volumes, the volume groups, the logical volumes and the file systems can be presented in two ways: from a volume group viewpoint in a chart format, or from a file system viewpoint (all the physical volumes related to a given file system). This characteristic makes it possible to facilitate the modification (semantic and format) of the way in which a semantic is presented. It is also possible to use programs in "Java" language (the data are then presented dynamically on the client end). Advantageously, an aid can be associated with the construction of each page. This aid must contain all the information and explanations the user needs to understand, on the one hand, the significance of the information displayed and, on the other hand, how to navigate within the labyrinth that is a "hyperscript" in HTML language. All the texts displayed on the client's screen can be internationalized, that is, they can be read in the language of the server queried or in that of the client who sent the request.
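The split into two levels can be sketched as three small Python classes plus a generic, OS-independent rendering helper standing in for the module GT. Level L1 holds the modules PM and OS-SK, which know the semantics but never touch the operating system. Level L2 holds the module OS-IB, which performs the physical accesses.

Everything here is hypothetical. The class names, the `(physical volume, volume group)` rows and the disk names are illustrative only. The OS command that OS-IB would run is replaced by a fixed list so that the example runs anywhere.

```python
from collections import defaultdict
from html import escape


class InfoBuilder:
    """Stand-in for OS-IB (level L2): the only layer that touches the OS.
    A real version would run an OS command; here the raw rows are injected."""

    def __init__(self, raw_rows):
        self._raw_rows = raw_rows

    def physical_volumes(self):
        """Low-level data: (physical volume, volume group) pairs."""
        return list(self._raw_rows)


class SemanticKnowledge:
    """Stand-in for OS-SK (level L1): builds a logical structure from OS-IB data."""

    def __init__(self, info):
        self._info = info

    def volumes_by_group(self):
        groups = defaultdict(list)
        for pv, vg in self._info.physical_volumes():
            groups[vg].append(pv)
        return dict(sorted(groups.items()))


def html_table(headers, rows):
    """Stand-in for a GT tool: OS-independent HTML table builder."""
    head = "".join(f"<th>{escape(h)}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(c))}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


class PageBuilder:
    """Stand-in for PM: asks OS-SK for semantic data, lets GT render it."""

    def __init__(self, sk):
        self._sk = sk

    def volume_group_page(self):
        groups = self._sk.volumes_by_group()
        rows = [(vg, ", ".join(pvs)) for vg, pvs in groups.items()]
        table = html_table(["Volume group", "Physical volumes"], rows)
        return "<html><body>" + table + "</body></html>"
```

In this model, `PageBuilder` only ever sees the logical structure. Porting to another operating system would therefore mean replacing `InfoBuilder` alone, which mirrors the intent of the two-level split.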
The generic tool module offers a set of tools to aid in the construction of pages in HTML language in order to construct two- or three-dimensional graphical representations. The tools so offered are not semantically dependent on the operating system OS. The following are examples of tools provided:

The sequence-type program, or "scripting" tool (in the language of one skilled in the art), is a portable tool used to write the modules necessary for applying the present process. It has the following advantages:

  easy and independent access to the basic commands of the operating system OS, which make it possible to manipulate the files and execute the commands of this operating system,

  an efficient "debugging" and "trace" service,

  a satisfactory level of performance,

  true portability to different operating systems,

  a syntax which allows easy maintenance and upgrading,

  a language which makes it possible to structure and represent complex data,

  the ability to expand the language by interfacing with the C language.

A set of graphical tools independent of the operating system, such that two-dimensional and three-dimensional graphics are produced simply and rapidly, photographs are incorporated into HTML pages, and evolving graphics are produced as a function of phenomena which themselves evolve slowly.

Tools which make it possible to take full advantage of the HTML language.

A tool to aid in the construction, which offers a generic application interface which is used by each page.

The semantic knowledge module OS-SK, which is dependent on the operating system, collects the low-level data in an information construction module OS-IB, itself dependent on the operating system and present at the second level L2. It then constructs a high-level semantic representation (logical structure) of the data requested. For example, consider the correlation between the physical volumes specific to an operating system, such as a system under UNIX (registered trademark in the US and other countries, exclusively licensed through the X/OPEN Company Ltd.), the volume groups, the logical volumes and the file systems. It can be presented by first acquiring the basic data, then grouping this data by volume group from a logical viewpoint. In greater detail, the module OS-SK comprises three parts. A first part, directly related to the module PM, contains the code that makes it possible to collect semantic information and present it. A second part, which is the manager of the logical structures, contains the code that makes it possible to construct semantic structures by grouping individual basic structures, which for example present the relationship between local area network adapters, interfaces and daemons. Finally, a third part, which is the manager of the basic structures, contains the code that makes it possible to construct basic structures, for example the information related to each local area network adapter.

Moreover, a plurality of cache memories are advantageously used. They serve to minimize the time needed to respond to a request or to construct a page in HTML language, or when several users successively request the same page. These cache memories are provided for saving complex acquisition information rather than dynamic information. They are structured as a function of their level of utilization (page, logical structure, basic information). A duration and/or a condition is associated with each block stored in a cache memory. Thus, the page cache memory, which stores complete pages such as those related to a system configuration, is associated with the first part. The logical structure cache memory, which stores structures such as those related to a file system, logical volumes, volume groups and physical volumes, is associated with the second part. The basic structures cache memory, which stores structures such as the activity or characteristics of a particular physical volume, for example a disk, is associated with the third part.

The information construction module OS-IB of the operating system, which is dependent on the operating system OS, uses commands that depend on the operating system to acquire low-level data. It offers an application interface API for communicating with the semantic knowledge module OS-SK of the operating system; it [...] to the operating system OS through an application interface. According to the present process, it is equipped to present information relative to the system configuration, the system utilization, the alarms and the reliability of the system. This information corresponds to instantaneous values, historical values, or tendencies. Some values relative to the characteristics of the operating system can be accessed by directly calling commands of the operating system (for example, in order to obtain complete software and hardware configurations in a given operating system such as AIX, Windows NT, etc.), whereas others are produced by specific daemons. In order to offer the module OS-SK a uniform view of all the necessary information, the module OS-IB uses an application interface API which makes it possible to hide the internal complexity. In fact, the module OS-IB is completely independent of the higher level. In some cases, certain immediate values cannot be acquired at once, because one or more successive selective analyses ("snapshots") are needed in order to obtain a result or to give representative information.

The immediate values make it possible to know and to rapidly display, for example:

  the instantaneous state of a machine,

  an instantaneous software or hardware configuration,

  the processors consuming the most power at a given moment,

  the instantaneous activity of the "cpu",

  the instantaneous status of the alarms, etc.

The immediate values also make it possible to establish effective diagnoses relative to fundamental components of a machine. The so-called historical values are saved in order to be used either to display changes of state, the evolution of values with a graphical presentation, etc., or to calculate tendencies or make predictions.

One of the intelligent agents used in the server, and in particular integrated into the module OS-IB, is the agent RSF (Remote Services Facilities). This is an event management agent designed to send real-time notifications in a given mode. The utilization of the agent RSF makes it possible to provide a structure for the alarm buffer registers as well as a history of the alarms, which considerably facilitates the search for a particular alarm. Moreover, each immediate or instantaneous value has an associated threshold, which the user can easily modify according to his needs and which, when exceeded, triggers an event. This event is then stored in a register provided for recording the history of events with a date, an indicator name related to the event, a threshold value, etc. The last event recorded can be used to determine whether an instantaneous value is above or below a threshold, and then to display on the client's screen the corresponding color (red or green, respectively) of the corresponding icon. The agent RSF is warned of new events and activates the triggering of an alarm when a threshold under observation is exceeded. The triggering can be deferred until, for example, the threshold is observed to have been exceeded two or three times. Of course, the agent RSF actually triggers an alarm only when no alarm for this threshold has already been triggered, or when a previous alarm has been deleted. Thus, a threshold may be considered to trigger an event each time a value exceeds or falls below this threshold, or when a value has exceeded or fallen below this threshold two or three times. The agent RSF frequently scans the error log file "errlog file" and thus efficiently detects a new alarm. When an alarm is detected, the description of this alarm is stored in a file associated with the software or hardware element which caused this alarm. The agent RSF saves a history of all the alarms. The alarms are suppressed either after the intervention of the user, or upon the expiration of a predetermined time period when the user has initiated a request specific to a page which has this alarm.
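The threshold behavior attributed to the agent RSF can be sketched in Python. The sketch covers four points:

- an event on each crossing,
- an alarm deferred until the threshold has been exceeded a given number of times,
- no new alarm while one is already active,
- a red or green icon derived from the last recorded value.

This is an illustrative model, not the agent's actual code. The class and field names are assumptions. So are the cumulative (not necessarily consecutive) count of crossings and the default of two crossings.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Threshold:
    """Hypothetical threshold attached to one immediate value."""
    indicator: str             # indicator name recorded with each event
    limit: float               # user-modifiable threshold value
    required_hits: int = 2     # defer the alarm until this many crossings
    hits: int = 0              # crossings counted since the last alarm was cleared
    alarm_active: bool = False
    last_above: bool = False   # state derived from the last recorded value
    history: list = field(default_factory=list)  # event register

    def observe(self, value, when=None):
        """Record one sample; return True only when a new alarm is raised."""
        self.last_above = value > self.limit
        if not self.last_above:
            return False
        # Each crossing is an event stored with date, indicator and threshold.
        self.history.append((when or datetime.now(), self.indicator, self.limit, value))
        self.hits += 1
        if self.alarm_active or self.hits < self.required_hits:
            return False  # alarm deferred, or one is already active
        self.alarm_active = True
        return True

    def clear(self):
        """Alarm deleted (by the user or on expiry); a new alarm may be raised later."""
        self.alarm_active = False
        self.hits = 0

    def icon_color(self):
        """Colour of the corresponding icon on the client's screen."""
        return "red" if self.last_above else "green"
```

For example, with a limit of 90 the samples 95, 80, 97 and 99 raise exactly one alarm, at 97. At 95 the alarm is still deferred; at 99 one is already active.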

For better understanding, the programming interface API between the module OS-IB and the module OS-SK can be divided into two parts which offer an efficient view of all the information acquired by the module OS-IB. The first part essentially relates to files containing information which is mainly fixed. The second part includes several C language libraries which provide an interface with the buffer registers containing the immediate values, as well as with the files containing the various histories. This second part is generic, which means that it is not necessary to add a new function to the interface API when a new class with new attributes is added. All the classes and all the attributes are seen as unique identifiers, and a new entry for the new class is simply added into an abstract description of all the classes. An activity buffer register is connected to this second part. This register is used as a programming interface for data stored in a shared memory, and it is filled by a second intelligent agent, ASRX (Automatic Site Reporter for Unix), with information read in the operating system. A buffer register containing the history of the activities is constructed with immediate values obtained from the activity buffer register, using the second part of the programming interface API. A threshold manager saves the prior values of a buffer register so as to be capable of determining whether a value has exceeded a threshold. This manager directly calls the agent RSF to inform it that a new event has occurred and that it must analyze whether or not an alarm must be produced. The code of this manager can be generic, since all the data inspected is of the representative information type; for example, a threshold can be placed on the size of each file system. Likewise, a tendency manager saves the prior values of a buffer register so as to be capable of warning that a limit is about to be reached, for example the number of time units in which the maximum size of a file system would be reached.

In this context, the agent RSF interrogates the file "errlog" as well as various specific files. It fills the alarm buffer register and the configuration history and alarm history files, and finally calls the audible warning signals ("beepers"), when they exist. In fact, during the monitoring of the log files of the system, the agent RSF detects a problem when an error occurs, since an application such as "errlog" writes specific messages to these log files. When a problem is detected, a reaction is triggered if a threshold or a set of thresholds included in such a message is also exceeded within a determined time period. The information relative to the instantaneous configuration specific to the detection of, and the reaction to, the problem found by the agent RSF is stored in files of the agent RSF. These files contain message sources, message "templates" and the messages saved.
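The tendency manager only needs prior values of a buffer register to estimate how many time units remain before a limit is reached, such as the maximum size of a file system. The text does not say how the estimate is computed. The sketch below assumes a simple least-squares straight-line fit over equally spaced samples, which is one common choice among several; the function name is hypothetical.

```python
def time_units_to_limit(samples, limit):
    """Estimate time units remaining until `limit` is reached.

    samples: prior values of a buffer register, one per time unit, oldest first.
    Returns None when there are too few samples or the trend is not rising.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    # Least-squares slope of value against time index.
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    slope = num / den
    if slope <= 0:
        return None
    # Fitted value at the most recent sample, extrapolated forward.
    last_fit = mean_y + slope * ((n - 1) - mean_x)
    return max(0.0, (limit - last_fit) / slope)
```

Take file-system sizes of 10, 20, 30 and 40 (in any unit) and a maximum of 100. Then `time_units_to_limit([10, 20, 30, 40], 100)` returns `6.0`. A warning could be issued whenever the result falls below a chosen horizon.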

Each message source that the agent RSF monitors and 
transmits has the following attributes: 
a unique character-string identifier used by the agent RSF,

a path to the log file of the source monitored,

a path to an action to be executed when a threshold crossing is detected,

a time interval at the end of which a piece of detected information is deleted,

a time interval at which the agent RSF scans for the arrival of new messages from the source.
In this case, the commands which manipulate the message source command file can, for example, be the following:

mkmsrc: this command alerts the agent RSF to monitor a source which has the attributes described above,

lsmsrc: this command triggers the printing of various attributes of the sources contained in the command file,

chmsrc: this command makes it possible to modify the attributes of an existing source that is being monitored,

rmmsrc: this command indicates that the source with the specified identifier no longer needs to be monitored for the reception of messages.
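The message source attributes and the four commands above map naturally onto a small registry. The sketch below is a hypothetical in-memory model written in Python. The real `mkmsrc`, `lsmsrc`, `chmsrc` and `rmmsrc` are commands of the agent RSF, and the text does not give their arguments. The field names, the use of seconds for the time intervals and the example paths are all assumptions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MessageSource:
    """One monitored message source (field names are illustrative)."""
    source_id: str         # unique character-string identifier
    log_path: str          # log file of the monitored source
    action_path: str       # action executed when a threshold crossing is detected
    expiry_s: int          # delay after which detected information is deleted
    scan_interval_s: int   # how often the source is scanned for new messages


class SourceRegistry:
    """In-memory stand-in for the message source command file."""

    def __init__(self):
        self._sources = {}

    def mkmsrc(self, source):
        """Start monitoring a new source."""
        if source.source_id in self._sources:
            raise ValueError(f"source {source.source_id!r} is already monitored")
        self._sources[source.source_id] = source

    def lsmsrc(self):
        """List the attributes of every monitored source."""
        return sorted(self._sources.values(), key=lambda s: s.source_id)

    def chmsrc(self, source_id, **changes):
        """Modify attributes of an existing source."""
        self._sources[source_id] = replace(self._sources[source_id], **changes)

    def rmmsrc(self, source_id):
        """Stop monitoring the source with this identifier."""
        del self._sources[source_id]
```

Making each source immutable and changing it only through `chmsrc` keeps every modification explicit. A real command file would also need to persist these records.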
A list of messages is saved for each source monitored by the agent RSF, and each message has the following attributes:

a unique identifier used by the agent RSF to detect a determined message in the source,

an integer value, constituting a threshold, which defines how many occurrences of the message can be detected before an action is executed,

a period of time during which a detected message must be considered valid; if a message is detected and the count remains below the threshold, the message is deleted after this period of time,

the maximum number of messages which can be stored in a register after detection.

The commands which manipulate the files containing the 
message "templates" can, for example, be the following: 

lsmsg: this command makes it possible to list the messages in each source monitored by the agent RSF,

mkmsg: this command makes it possible to alert the agent 
RSF to search the source given for a specified message 
having the attributes described above, 

chmsg: this command makes it possible to change the 
attributes of a message which is sought by the agent 
RSF, 

rmmsg: this command makes it possible to alert the agent 
RSF so that it no longer searches for the determined 
message associated with a monitored source. 

Moreover, a daemon of the agent RSF will periodically 
scan each of the sources monitored, searching for new 
occurrences of the messages monitored. When one of them 
is found, the count relative to the number of occurrences of 
this message is incremented, and if this count exceeds the 
determined threshold value during the determined time 
period, the specific action is executed. This daemon will also 
periodically clean up the files when necessary. 
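The scanning daemon's bookkeeping for one monitored message can be sketched as below. It counts occurrences, forgets detections older than the validity period, caps the number of detections kept, and runs the action when the count exceeds the threshold. Several details are assumptions made for the illustration: plain substring matching, resetting the count after the action runs, and time expressed in seconds. The class name is hypothetical.

```python
from collections import deque


class MessageWatch:
    """Hypothetical state kept by the daemon for one monitored message."""

    def __init__(self, pattern, threshold, validity_s, max_stored, action):
        self.pattern = pattern          # text identifying the message in the source
        self.threshold = threshold      # occurrences tolerated before acting
        self.validity_s = validity_s    # how long a detection stays valid
        self.detections = deque(maxlen=max_stored)  # capped register of detections
        self.action = action            # callable run when the threshold is exceeded

    def purge(self, now):
        """Clean-up: drop detections older than the validity period."""
        while self.detections and now - self.detections[0] > self.validity_s:
            self.detections.popleft()

    def scan(self, new_lines, now):
        """Process the lines appended to the source since the previous scan."""
        self.purge(now)
        for line in new_lines:
            if self.pattern in line:
                self.detections.append(now)
        if len(self.detections) > self.threshold:
            self.action(self.pattern, len(self.detections))
            self.detections.clear()  # assumption: start counting afresh
```

With a threshold of 2 and a 60-second validity, two detections at t=0 and t=30 do not fire the action. By t=100 both have expired, so a third detection alone still does not fire. Two more at t=110 bring the valid count to three and run the action once.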

A second intelligent agent, the agent ASRX, is also present in the module OS-IB. Its principal object is the measurement and management of a machine. It makes it possible to automatically collect information, which is then automatically presented in the form of presentation reports that are also automatically updated. The agent ASRX is advantageously used to produce the pages relative to the utilization of the machines monitored. In particular, it makes it possible to:

  collect the raw data relative to the utilization of the machines,

  calculate the usable data in order to supply the calculated immediate data,

  detect threshold crossings,

  produce the history (day, week, month, from a given moment) of the calculated data.

Any modification in the system must be supplied to two places. First, it goes to the data collector of the agent ASRX, to which the new attributes in the classes, the new classes, the file system, etc. must be added. Second, it goes to the tool for creating presentation reports of the data, which is inside the agent ASRX, since this tool needs the new attributes, the new classes and the new calculation modes to create its reports. The agent ASRX is in fact constituted by three main parts: the data collector, the data manager, and the data reporter.
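The division of the agent ASRX into a data collector, a data manager and a data reporter can be illustrated with three plain functions. Raw samples are parsed, turned into a per-day history, and rendered as a report. This is only a sketch. The function names and the `(ISO timestamp, cpu %)` sample format are hypothetical, and only the daily level of the day/week/month history is shown.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def collect(raw):
    """Data collector stand-in: parse raw (ISO timestamp, cpu %) samples."""
    return [(datetime.fromisoformat(ts), float(v)) for ts, v in raw]


def manage(samples):
    """Data manager stand-in: turn raw samples into a per-day (mean, peak) history."""
    by_day = defaultdict(list)
    for ts, v in samples:
        by_day[ts.date()].append(v)
    return {day: (mean(vs), max(vs)) for day, vs in sorted(by_day.items())}


def report(history, title):
    """Data reporter stand-in: render the history as a plain-text report."""
    lines = [title]
    for day, (avg, peak) in history.items():
        lines.append(f"{day.isoformat()}  mean={avg:.1f}%  peak={peak:.1f}%")
    return "\n".join(lines)
```

Keeping the three stages separate means that a new attribute touches the collector and the reporter without changing the history logic in between. This echoes the two places the text says a modification must be supplied.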

The data collector runs on each machine monitored. The data is collected in patterns that make it possible to determine which objects must be monitored in order to collect the basic information necessary to obtain a steady, reliable process. Data collection is based on the design of a basic structure which facilitates the addition of new objects to be monitored. The main characteristics of the data collector can be summarized as follows:

  a reasonable memory size,

  minimized cpu consumption so as not to affect the performance of the machine,

  an execution which does not cause any failure of the machine and which maintains its integrity,

  simple porting to various types of operating systems and simple testing,

  easy installation in various types of tools,

  simple management and implementation of the collection, without any loss of data during a determined period.
The chief functions of the data collector are the following: 
the data is collected in two different ways: 

at the outset, certain attributes defined so as to be
constant until the next reinitialization of the machine,
for example the version of the system, are collected once
and must not be dynamically changed,
at a set date, certain attributes can be collected accord- 
ing to the year, the month, the week, the day and the 
time (hour, minute). 
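
The two collection modes can be sketched as follows: constant attributes are stored once and kept until the next reinitialization, while scheduled attributes are collected when the current time matches a configured date. In this minimal Python sketch the `Schedule` and `ConstantAttributes` names are hypothetical, a field left as `None` matches any value (much like `*` in a crontab), and the week is assumed to be the ISO week number.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Schedule:
    """Hypothetical per-object collection date; None matches any value."""
    year: Optional[int] = None
    month: Optional[int] = None
    week: Optional[int] = None    # ISO week number (assumption)
    day: Optional[int] = None     # day of the month
    hour: Optional[int] = None
    minute: Optional[int] = None

    def is_due(self, now: datetime) -> bool:
        actual = {
            "year": now.year,
            "month": now.month,
            "week": now.isocalendar()[1],
            "day": now.day,
            "hour": now.hour,
            "minute": now.minute,
        }
        # Every configured field must equal the current value.
        return all(getattr(self, k) in (None, v) for k, v in actual.items())


class ConstantAttributes:
    """Attributes collected once at start-up and frozen until reinitialization."""

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}

    def collect_once(self, values: Dict[str, Any]) -> None:
        for name, value in values.items():
            self._values.setdefault(name, value)  # never changed dynamically

    def reinitialize(self) -> None:
        self._values.clear()  # models a reboot of the machine

    def get(self, name: str) -> Any:
        return self._values[name]
```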
Specified dates and times for the collection of data must 
be configured for each object. A method calling mechanism 
makes it possible to add new objects to be monitored 
(system call functions, system commands). The objects to be 
monitored are classified using a given object model and they 
satisfy the following criteria: 

allowing the calculation of the mean time to failure
(MTTF), the mean up time (MUT) or mean down time
(MDT) of a hardware or software component,
recording any error or event which occurs such as:
a system failure and its cause (support, power loss,
etc.), the reinitialization of the system,
"hardware" errors,
"software" errors (kernel, transmission, applications),
changes in system configuration,
to record the main activities of the components:
statistics relative to the system load,
cpu time,
disk activity (number of inputs/outputs),
number of operations,
transmission activities,
to record the initial "hardware and software" configura- 
tions for the data collector (list of software installed, list 
of cards installed). 
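
The method calling mechanism, through which new objects are added by registering a system call function or a system command, can be sketched as a registry keyed by object name. This minimal Python sketch assumes a hypothetical four-way object model (reliability, event, activity, configuration) loosely following the criteria above; `os.cpu_count` and `platform.release` are real standard-library calls used only as example methods.

```python
import os
import platform
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical object model, loosely following the criteria listed above.
CATEGORIES = {"reliability", "event", "activity", "configuration"}


class ObjectRegistry:
    """Objects to be monitored, each bound to the method that measures it."""

    def __init__(self) -> None:
        self._methods: Dict[str, Tuple[str, Callable[[], Any]]] = {}

    def register(self, name: str, category: str, method: Callable[[], Any]) -> None:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self._methods[name] = (category, method)

    def collect(self, category: Optional[str] = None) -> Dict[str, Any]:
        """Call every registered method, optionally restricted to one category."""
        return {
            name: method()
            for name, (cat, method) in self._methods.items()
            if category is None or cat == category
        }


def default_registry() -> ObjectRegistry:
    registry = ObjectRegistry()
    # Standard-library calls standing in for "system call functions".
    registry.register("cpu_count", "configuration", os.cpu_count)
    registry.register("os_release", "configuration", platform.release)
    return registry
```

A system command could be wrapped the same way, for instance in a function that calls `subprocess.run(..., capture_output=True, text=True)` and returns its standard output; adding an object then never requires changing the collector itself.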

All this data is extracted from the object data manager.
The standard configurations are provided to facilitate the
production of statistics and the analysis of availability. The
system resource controller is advantageously used to manage
the data collector (starts/stops, validation/invalidation of
the "trace"). In the case in which the data collector is
interrupted, the system resource controller records the
events in the error record file.

The data manager itself makes it possible to rearrange the
data collected from the various machines observed so as to
facilitate access according to predefined criteria (date,
version, machine model, etc.). The chief characteristics of
the data manager can also be summarized: a large storage
capacity, good performance in terms of requests, ease in
handling complex requests and ease of management. The data
are calculated and saved in files. The main functions
processed by the data manager are the following:
the creation of files or "data bases" (arrays, relations,
indices, etc.) and the design of files or "data bases" for
easy access, for adding classes of new objects to be
monitored and for improving performance,
the insertion of reliable, available and easy-to-use data
into the files or "data bases"; all the new objects
collected in the machines observed are placed in these
files or "data bases", in which shared information is
constructed (without scanning),
the transmission of high-level requests to these files or
"data bases" in order to allow the execution of
presentation reports, data analysis and the running of
interactive applications such as browsing; programming
interfaces for the operations to be executed in these files
or "data bases" ("get", "search", "filter", etc.) are
provided to facilitate the writing of applications.
The organization of these files or "data bases" is designed
to satisfy all the requests sent by the applications. A
relational database, for example Oracle (trademark of Oracle
Corporation), can be used to manage the data in combination
with any tool required for the development of applications.

The data reporter in turn makes it possible to generate
standard presentation reports from files in which all the raw
data collected is stored. The chief characteristics of the data
reporter can be summarized as follows: ease of use and
control of requests by means of graphical user interfaces;
in addition, the generation of reports must be able to use
existing tools. The results are classified so as to provide
excellent availability and reliability of the system,
high-performance system activities, and easy
"hardware/software" configuration of the system. The various
results can be effectively obtained and used following the
definition of a general presentation report or the
definition of a detailed presentation report, while
interactive applications are provided such as, for example,
a browser for reading certain fields of the files, as well as
a graphical presentation on a screen and its printing on a
printer.

More precisely, the object of the general presentation
report is to provide a synthetic, global view of the
reliability and availability of the system from indicators
such as the mean time to failure (MTTF), the mean time
between failures (MTBF), the mean up time (MUT) or the mean
down time (MDT) of a hardware or software component. The
marking of errors which occur frequently, the correction of
the main errors for a given "software" version, the
distribution of the errors and the utilization of components
that are standard in terms of cpu time, input/output volume,
network activity and resources used are also facilitated by
the definition of the general presentation report. The
activity of the components covers a wide variety of
utilizations; the measurements ("metrics") offered are, for
example, used to perform a count, or to determine a load or a
balancing of work or samples to be examined, etc. Thus, the
analysis of the work load as a function of the resources, in
terms of response time to a transaction, provides a
significant element for efficient analysis of availability, as
well as an indication of the resources responsible for the
problems encountered.
All of this requires the utilization of components for mea- 
suring and collecting the response times to a transaction 
(disk, memory, cpu times, networks, etc.). Generally, the 
system resources which can cause problems for an applica- 
tion are the processor, the memory, the disks, the chains, the 
interprocess communications, the networks, the functional 
units, etc. 
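
The indicators named in the general presentation report can be derived from the alternating up and down periods of a component. The following minimal Python sketch assumes that each up period ends in a failure and each down period ends in a restoration, and uses the common convention MTBF = MUT + MDT; for such a repairable component it treats MTTF as the mean duration of the up periods, which makes it equal to MUT here. The function name and the input format are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Indicators:
    mttf: float  # mean time to failure (taken equal to MUT in this sketch)
    mtbf: float  # mean time between failures = MUT + MDT (assumed convention)
    mut: float   # mean up time
    mdt: float   # mean down time


def reliability_indicators(periods: List[Tuple[str, float]]) -> Indicators:
    """periods: chronological ("up" | "down", hours) intervals of one component."""
    if any(state not in ("up", "down") for state, _ in periods):
        raise ValueError("period states must be 'up' or 'down'")
    up = [hours for state, hours in periods if state == "up"]
    down = [hours for state, hours in periods if state == "down"]
    if not up or not down:
        raise ValueError("need at least one up period and one down period")
    mut = sum(up) / len(up)
    mdt = sum(down) / len(down)
    return Indicators(mttf=mut, mtbf=mut + mdt, mut=mut, mdt=mdt)
```

For example, up periods of 100 h and 200 h separated by repairs of 2 h and 4 h give MUT = 150 h, MDT = 3 h and MTBF = 153 h.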

Likewise, the object of the detailed presentation report 
corresponds to a specific utilization of the files for a par- 
ticular object. Thus, the requirements must be clearly 
expressed in order to provide tools dedicated to the extrac- 
tion of data in these files. The precise form of a statistical 
analysis of data is specific to each utilization. Its design 
makes it possible to produce standard presentation reports 
on the availability and reliability of the machines and 
components, which corresponds to a global view of the
product monitored in terms of mean time to failure (MTTF),
mean up time (MUT) or mean down time (MDT), etc., as
indicators on the "hardware" or "software" components. A 
correlation can also be established between the activities of 
the components and the errors. Its design also makes it 
possible to present general information on the contents of 
the files: the date of observation (start, end, number of 
machines, model, version, etc.), a list of the machines 
observed by activity (development, configuration 
management, files, remote processing, etc.), the number of 
error types detected during the observation, a list of the 
models observed, a list of the operating systems observed by
version, and the number of machines observed for each
model. As a result of this design, services are also
provided for carrying out the analysis to determine the cause 
of the error. An interactive application is necessary to 
implement this characteristic, and the data representation 
uses a graphical or textual presentation. 
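
The text does not specify how the correlation between the activities of the components and the errors is measured. As one possible choice, assumed here, the following Python sketch computes the Pearson coefficient between two equal-length series sampled over the same intervals, for example disk input/output counts and error counts per hour; the function name and the series in the usage are illustrative.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series gives no correlation information
    return sxy / sqrt(sxx * syy)
```

A coefficient close to +1 between the hourly disk activity and the hourly error count would point the analysis toward the disk subsystem; correlation alone does not establish the cause of the error.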

The contents of the files also provide a synthetic view of 
all the information, which helps to choose the machines to 
be observed, and they include the number of existing ver- 
sions in the files, the start and the end of the observation 
period for a version of a system, as well as the number of 
machines which satisfy the criteria of the models and the 
versions. 
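
The synthetic view of the file contents (number of versions, start and end of the observation period for each version, number of machines satisfying the model and version criteria) reduces to a few aggregations. A minimal Python sketch, assuming each stored observation is a hypothetical (machine, model, version, day) record:

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class Observation:
    machine: str
    model: str
    version: str   # system version observed on that machine
    day: date


def observation_periods(obs: Iterable[Observation]) -> Dict[str, Tuple[date, date]]:
    """Start and end of the observation period for each system version."""
    periods: Dict[str, Tuple[date, date]] = {}
    for o in obs:
        start, end = periods.get(o.version, (o.day, o.day))
        periods[o.version] = (min(start, o.day), max(end, o.day))
    return periods


def machines_matching(obs: Iterable[Observation], models: Set[str], versions: Set[str]) -> int:
    """Number of distinct machines satisfying both the model and version criteria."""
    return len({o.machine for o in obs if o.model in models and o.version in versions})
```

The number of existing versions in the files is then simply the number of keys returned by `observation_periods`.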

The view of a machine can be divided according to various
criteria: a view relative to the position, a view relative to
the model of the machine, a view relative to the errors, and
a view relative to the "hardware" or "software" components.

In this way, the process according to the invention pro- 
vides a general state of the server monitored, including the 
behavior of the machine in terms of abnormal events, 
failures or deteriorations in performance. In fact, it provides 
a graphical application which makes it possible to execute 
system management tasks by manipulating objects. This 
process is built on various concepts which take into account 
the operating system, the storage of the data, the network in 
question, the printing, and the applications. Each concept 
involves precisely identified objects, for example the con- 
cept related to the operating system groups the cpu, the 
processes, the daemons, the number of users connected, etc. 
For each object managed, the following elements are cov- 
ered: the object identification attributes, the object configu- 
ration attributes, the object utilization and the performance 
attributes, the attributes for access to the objects and the 
status information of the objects (active or inactive). 
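
The elements covered for each managed object (identification, configuration, utilization and performance, access, and active or inactive status) map naturally onto a record type. A minimal Python sketch, with hypothetical field names and an illustrative `cron` daemon under the operating-system concept:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Status(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"


@dataclass
class ManagedObject:
    """Hypothetical record holding the attribute groups listed above."""
    concept: str                                   # e.g. "operating system"
    identification: Dict[str, str]                 # identification attributes
    configuration: Dict[str, str] = field(default_factory=dict)
    utilization: Dict[str, float] = field(default_factory=dict)  # utilization and performance
    access: Dict[str, str] = field(default_factory=dict)         # access attributes
    status: Status = Status.ACTIVE

    def summary(self) -> str:
        name = self.identification.get("name", "?")
        return f"{self.concept}/{name}: {self.status.value}"


def inactive(objects: List[ManagedObject]) -> List[str]:
    """Names of objects whose status is inactive, e.g. a stopped daemon."""
    return [o.identification.get("name", "?") for o in objects if o.status is Status.INACTIVE]
```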

The present process relates to the aspects of the software 
and hardware configurations, to the use of the system 
resources in real time, as well as to the various tendencies
and the handling of events. It makes it possible to use 
advantageously organized functionalities to carry out the 
following various operations: 

the configuration oriented operation, which provides a
global view of the software and hardware configuration
of a server,

the network management oriented operation, which pro- 
vides a synthetic view relative to the availability of the 
resources in question, 

the utilization oriented operation, which makes it possible 
to monitor the capacity of the resources to meet current 
and future needs; it involves measuring the utilization 
of the main software and hardware resources and 
makes it possible to provide a visual check on the 
server, 

the change oriented operation, which makes it possible to
save any record of a software or hardware change,
the event oriented operation, which makes it possible to 
detect, produce presentation reports on, search for and 
correct the problems related to the message sending 
service; an analysis of the presentation reports relative 
to the problems makes it possible to prevent the reoc- 
currence of these problems, 
the error management oriented operation, which makes it
possible to re-establish normal services for the user,
the service oriented operation, which makes it possible to 
control the services used most often such as printing, 
electronic mail, file transfers, initialization of remote 
processing, etc. 
the security oriented operation, which makes it possible to 
provide the status of the control of access to local or 
distributed resources from the user point of view, for 
example the statistics on attempted intrusions, 
the application oriented operation, which makes it pos- 
sible to manage various applications (configuration, 
installation, client, etc.); it supplies details on the status, 
the utilization, and the consumption of resources rela- 
tive to these applications. 
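
The oriented operations listed above can be viewed as different views selected by the kind of request a client makes. A minimal Python sketch of such a dispatch, assuming a hypothetical registry that maps each operation to a handler; only two handlers are filled in, and the percentage of capacity used and the count of failed logins are illustrative computations, not ones specified by the process.

```python
from enum import Enum
from typing import Callable, Dict


class Operation(Enum):
    CONFIGURATION = "configuration"
    NETWORK_MANAGEMENT = "network management"
    UTILIZATION = "utilization"
    CHANGE = "change"
    EVENT = "event"
    ERROR_MANAGEMENT = "error management"
    SERVICE = "service"
    SECURITY = "security"
    APPLICATION = "application"


Handler = Callable[[dict], dict]
HANDLERS: Dict[Operation, Handler] = {}


def handles(op: Operation) -> Callable[[Handler], Handler]:
    """Decorator registering a handler for one operation."""
    def register(fn: Handler) -> Handler:
        HANDLERS[op] = fn
        return fn
    return register


@handles(Operation.UTILIZATION)
def utilization_view(data: dict) -> dict:
    # data maps resource name -> (used, capacity); report percent used.
    return {name: round(used / cap * 100, 1) for name, (used, cap) in data.items()}


@handles(Operation.SECURITY)
def security_view(data: dict) -> dict:
    # Illustrative: count failed logins as attempted intrusions.
    return {"attempted_intrusions": len(data.get("failed_logins", []))}


def dispatch(op_name: str, data: dict) -> dict:
    op = Operation(op_name)  # ValueError for an unknown operation name
    handler = HANDLERS.get(op)
    if handler is None:
        raise NotImplementedError(f"no handler for the {op.value} operation")
    return handler(data)
```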
In conclusion, according to the present monitoring 
process, an effective, fast, easy and inexpensive solution is 
offered thanks to the utilization of intelligent agents in the 
servers for constructing dynamic pages which can be 
browsed and read by the browsers of clients, which clients 
need no development in order to browse and read the 
dynamic information received in response to their requests. 
Thus, these pages can dynamically adapt to the various 
requests. This technological option makes it possible to 
reduce costs significantly, since any development for 
upgrading or expanding the initial system need only be 
implemented in the server, and no specific software needs to 
be provided or used by the client, who need only be 
equipped with a commercial browser, which is a low-cost, 
universally known and used product with which most micro- 
computers are currently equipped. As a result of this process, 
it is possible to operate on the main information which 
makes it possible to maintain proper functioning of the 
machines. A simple interface for the system resources and 
the services is supplied and serves to hide the complexity of 
the operating systems. This process is of particular interest 
to users who have no expertise in the field of systems 
management, and any user in charge of the operation of a 
machine is thus provided with a valuable aid, a large 
quantity of useful information which allows him to perform 
an effective intervention for corrective maintenance or 
recovery of resources. According to the present process, 
which can easily be launched from a microcomputer
connected to a local area network, a simple system configuration
and a global view of the behavior of the machine are 
provided, while the same graphical interface can be used in 
any machine of the system. Portability to various operating 
systems is made easy. The users managing the various 
machines with the same graphical interface and the same 
concepts can, without training, quickly understand whether 
or not a server is operating normally, and establish efficient 
contact with support teams by providing significant infor- 
mation for a proper diagnosis. Such an approach makes it 
possible to considerably reduce down time due to the failure 
of a server and, in addition, allows powerful and effective 
use of the information supplied according to the present 
process by an expert system. 
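
The server-side construction of dynamic pages, which leaves the client with nothing but a commercial browser, can be sketched with the Python standard library. In this minimal sketch, which illustrates the idea rather than the agents' actual implementation, the hypothetical `collect` callable stands in for the intelligent agents and is invoked on every GET, so each page reflects the state of the server at the moment of the request; all values are HTML-escaped before being placed in the page.

```python
import html
from http.server import BaseHTTPRequestHandler
from typing import Callable, Dict


def build_page(title: str, values: Dict[str, object]) -> str:
    """Render collected values as a small HTML page (all text is escaped)."""
    rows = "\n".join(
        f"<tr><td>{html.escape(str(k))}</td><td>{html.escape(str(v))}</td></tr>"
        for k, v in values.items()
    )
    t = html.escape(title)
    return (f"<html><head><title>{t}</title></head>"
            f"<body><h1>{t}</h1><table>\n{rows}\n</table></body></html>")


def make_handler(collect: Callable[[], Dict[str, object]]) -> type:
    """Build a request handler class that renders fresh data on every GET."""
    class MonitorHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            body = build_page("Server status", collect()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
    return MonitorHandler
```

Serving it could look like `HTTPServer(("127.0.0.1", 8080), make_handler(lambda: {"cpu_count": os.cpu_count()})).serve_forever()`; any browser pointed at that address then receives a freshly built page, and upgrading the monitoring only means changing the server side.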

While this invention has been described in conjunction 
with specific embodiments thereof, it is evident that many 
alternatives, modifications and variations will be apparent to 
those skilled in the art. Accordingly, the preferred embodi- 
ments of the invention as set forth herein, are intended to be 
illustrative, not limiting. Various changes may be made
without departing from the spirit and scope of the invention 
as set forth herein and defined in the claims. 

We claim: 

1. A process for real-time monitoring of a data processing
system for administration and maintenance support of the
system in an operating phase, said data processing system
being arranged to communicate in a client-server mode
through interconnected networks, each client including a
browser which supports a high-level hypertext language,
comprising the steps of: 

installing intelligent agents in each server for phrasing a 
client request via the intelligent agents in each server, 

running a check on the status of each server, measuring 
and storing parameter information indicating the status 
and behavior of the server at a given moment, 

automatically collecting the parameter information as a 
function of domains examined and systematically pro- 
cessed by the server, 

offering the information in the form of presentation 
reports contained in dynamically evolving pages, while 
accessing said dynamic pages having the collected and 
processed information responding to a request via the 
client's browser,

carrying out systematic processing of the parameter infor- 
mation collected for construction of the dynamic pages 
by the server in successive steps, each step correspond- 
ing to the processing of said information by a particular 
module, including a page construction module, a 
semantic knowledge module, an information construc- 
tion module and a generic tool module, 

receiving a client's request in the page construction 
module from the network and preparing the pages by 
collecting the necessary data in appropriate levels 
depending on the construction module's operating 
system, then, by presenting said data to the generic tool 
module for aiding in the construction, independent of 
the operating system, 

fetching low-level data in the information construction 
module depending on the operating system via the 
semantic knowledge module of the operating system, 
then 

constructing a high-level semantic representation of the 
data requested, 

acquiring low level data utilizing commands to the infor- 
mation construction module of the operating system to 
acquire said low-level data and communicating with
the semantic knowledge module of the operating sys- 
tem via an application interface, and 
constructing the pages in high-level hypertext language 
via the generic tool module for aiding in the construc- 
tion of the pages in high-level hypertext language. 

2. A monitoring process according to claim 1, character- 
ized in that for the application of said process, the semantic 
knowledge module of the operating system comprises three 
parts, a first part directly connected to the page construction
module and containing a code that makes it possible to 
collect and present semantic information, a second part for 
managing of logical structures containing the code to con- 
struct semantic structures by grouping individual basic 
structures, and a third part for managing basic structures 
containing the code to construct basic structures. 

3. A monitoring process according to claim 2, character- 
ized in that for the application of said process, a first 
intelligent agent in the server, integrated into the information 
construction module of the operating system, is an event 
management agent adapted to send real-time notifications in 
a given mode, the utilization of which makes it possible to 
provide alarm buffer registers and a history of alarms so as 
to facilitate the search for a particular alarm, said alarm 
buffer registers having an alarm threshold associated with 
each instantaneous threshold value which, when exceeded, 
triggers an event. 

4. A monitoring process according to claim 3, wherein for 
the application of said process, a second intelligent agent in 
the server, integrated into the information construction mod- 
ule of the operating system, is an agent for performing the 
steps of: 

measuring and managing a machine, 

automatically collecting information which is then auto- 
matically produced in the form of presentation reports 
that are automatically updated, 

producing pages relating to the utilization of the machines 
being monitored, 

collecting raw data relative to the utilization of the 
machines, and 

calculating usable data to supply instantaneous data to 
calculate the passing of thresholds and to produce the 
history of the data calculated, 

said second agent being constituted by three main parts, 
a data collector which collects the data in patterns to 
determine objects to be monitored so as to collect the 
basic information necessary, a data manager to rear- 
range the data collected from the various machines 
observed so as to facilitate access relative to predefined 
criteria, and a data reporter to generate standard pre- 
sentation reports from files in which all the raw data 
collected are stored. 

5. A monitoring process according to claim 4, wherein 
organized functionalities are used to carry out the following 
various operations: 

a configuration oriented operation to provide a global 
view of the software and hardware configuration of a 
server, 

a network management oriented operation to provide a 
synthetic view relative to the availability of the 
resources in question, 

a utilization oriented operation to monitor the capacity of 
the resources to meet current and future needs, which 
involves measuring the utilization of the main software 
and hardware resources and providing a visual check 
on the server, 
a change oriented operation to save any record of a
software or hardware change, 

an event oriented operation to detect, search for and 
correct problems relative to message sending, and 
produce presentation reports on an analysis of the 
presentation reports relative to the problems for pre- 
venting reoccurrence of the problems, 

an error management oriented operation to re-establish 
normal services for the user, 

a service oriented operation to control most often used 
services, 

a security oriented operation to provide the status of the 
control of access to local or distributed resources from 
a user point of view, and 

an application oriented operation to manage various appli- 
cations by supplying details on the status, the 
utilization, and the consumption of resources relative to 
such applications. 



6. A system for real-time monitoring of a data processing
system for administration and maintenance support of the
data processing system in an operating phase, said data
processing system adapted to communicate in a client-server
mode through interconnected networks, each client including
a browser which supports a high-level hypertext language,
comprising:

intelligent agents in each server for running a check on the
status of each server after phrasing of client requests,

means for measuring and storing information indicating
the status and the behavior of the server at a given
moment,

means for automatically collecting said information as a
function of domains examined and systematically
processing the information by the server,

means for offering the information in the form of
presentation reports contained in dynamically evolving pages
while the client's browser accesses said dynamically
evolving pages having the information collected and
processed in response to a request,

wherein the systematic processing of the information
collected for construction of the dynamic pages by the
server is carried out in successive steps, each step
corresponding to processing of said information by a
particular module, including:

a page construction module which receives a client's
request from a network and prepares the pages by
collecting necessary data in appropriate levels
depending on the operating system of the page
construction module, then presenting said data to a
generic tool module for aiding in the construction of
the pages, independent of the operating system,

a semantic knowledge module of the operating system
for fetching low-level data in an information
construction module and then constructing a high-level
semantic representation of the data fetched,

an information construction module of the operating
system which uses commands depending on the
operating system to acquire low-level data and
provides an application interface for communicating
with the semantic knowledge module of the operating
system, and

a generic tool module for aiding in the construction of
the pages in high-level hypertext language.

7. A monitoring system according to claim 6, characterized
in that the semantic knowledge module of the operating
system comprises three parts, a first part directly connected
to the page construction module, which contains a code that
makes it possible to collect semantic information and
present said information, a second part which manages
logical structures containing the code which makes it
possible to construct semantic structures by grouping
individual basic structures and a third part which manages
basic structures containing the code which makes it possible
to construct basic structures.

8. A monitoring system according to claim 7, further
including alarm buffer registers and characterized in that the
server, integrated into the information construction module
of the operating system, includes a first intelligent agent
functioning as an event management agent for sending
real-time notifications in a given mode, the utilization of
which provides a history of alarms so as to facilitate the
search for a particular alarm buffer register, said alarm buffer
registers having an alarm threshold associated with each
instantaneous threshold value which, when exceeded,
triggers an event, said alarm threshold being capable of
modification by a user.

9. A monitoring system according to claim 8, wherein the
server is integrated into the information construction module
of the operating system, said server including a second
intelligent agent for measuring and managing a machine,
and for automatically collecting information, said
information then being automatically produced in the form of
automatically updated presentation reports, said second
agent being used to produce pages relating to the utilization
of the machines monitored, thereby making it possible to
collect raw data relative to the utilization of the machines, to
calculate the usable data in order to supply the instantaneous
data calculated, to calculate the passing of thresholds and to
produce the history of the data calculated, said second agent
being constituted by three main parts, a data collector for
collecting data in patterns to determine which objects must
be monitored so as to collect the basic information
necessary, a data manager for rearranging the data collected
from various machines observed so as to facilitate access
relative to predefined criteria, and a data reporter for
generating standard presentation reports from files in which
all the raw data collected are stored.

10. A monitoring system according to claim 9, wherein
organized functionalities are used to carry out the following
various operations:

a configuration oriented operation which provides a
global view of software and hardware configurations of a
server,

a network management oriented operation which provides
a synthetic view relative to availability of resources in
question,

a utilization oriented operation for monitoring the
capacity of the resources to meet current and future needs,
which involves measuring the utilization of the main
software and hardware resources and providing a visual
check on the server,

a change oriented operation for saving a record of any
software or hardware change,

an event oriented operation for detecting, and producing
presentation reports on, searching for and correcting
problems relative to message sending, analyzing the
presentation reports relative to system problems with a
view toward preventing reoccurrence of the problems,

an error management oriented operation for
re-establishing normal services for the user,

a service oriented operation for controlling the most often
used services,



04/23/2004, EAST Version: 1.4.1 



6,021,437 


a security oriented operation for providing status
information relative to the control of access to local or
distributed resources from a user's point of view, and

an application oriented operation for managing various
applications by supplying details on the status, the
utilization, and the consumption of resources relative to
such applications.

11. A system for real-time monitoring of a data processing
system for administration and maintenance support of the
system in an operating phase, and for communicating in a
client-server mode through interconnected networks, each
client comprising a browser which supports a high-level
hypertext language, comprising intelligent agents in each
server for phrasing a client request via the intelligent agents
in each server, means for running a check on the status of
each server, means for measuring and storing parameter
information indicating the status and behavior of the server
at a given moment, means for automatically collecting the
parameter information as a function of domains examined
and systematically processed by the server, and means for
offering the information in the form of presentation reports
contained in dynamically evolving pages, while accessing
said dynamic pages having the collected and processed
information responding to a request via the client's browser,
wherein

said dynamically evolving pages are constructed by
carrying out systematic processing of parameter
information by the server in successive steps, each step
corresponding to the processing of said information by a
particular module, including a page construction
module, a semantic knowledge module, an information
construction module and a generic tool module,

the page construction module being configured to receive
a client's request from the network and prepare the
pages by collecting the necessary data in appropriate
levels depending on the construction module's operating
system, and then present said data to the generic tool
module for aiding in the construction, independent of
the operating system,

the information construction module being configured to
fetch low-level data depending on the operating system
via the semantic knowledge module of the operating
system, construct a high-level semantic representation
of the data requested, acquire low level data utilizing
commands to the information construction module of
the operating system to acquire said low-level data and
communicate with the semantic knowledge module of
the operating system via an application interface, and

the generic tool module being configured for aiding in the
construction of the pages in high-level hypertext
language.

12. A system according to claim 11, further comprising
means for carrying out systematic processing of the
parameter information collected for construction of the
dynamic pages by the server in successive steps, each step
corresponding to the processing of said information by a
particular module, including a page construction module, a
semantic knowledge module, an information construction
module and a generic tool module, means for receiving a
client's [...] appropriate levels depending on the
construction module's operating system, means for
presenting said data to the generic tool module independent
of the operating system for aiding in the construction of the
dynamic pages, means for fetching low-level data in the
information construction module depending on the operating
system via the semantic knowledge module of the operating
system, means for constructing a high-level semantic
representation of the data requested, means for acquiring
low level data utilizing commands to the information
construction module of the operating system to acquire said
low-level data and communicating with the semantic
knowledge module of the operating system via an
application interface, and means for constructing the pages
in high-level hypertext language via the generic tool module
for aiding in the construction of the pages in high-level
hypertext language.

13. A system according to claim 11, characterized in that
the semantic knowledge module of the operating system
comprises three parts, a first part directly connected to the
page construction module, said first part containing a code
that makes it possible to collect and present semantic
information, a second part for managing of logical structures
containing the code to construct semantic structures by
grouping individual basic structures, and a third part for
managing basic structures containing the code to construct
basic structures.

14. A system according to claim 11, characterized in that
for the application of said process there is included alarm
buffer registers and a first intelligent event management
agent in the server, integrated into the information
construction module of the operating system, said event
management agent being adapted to send real-time
notifications in a given mode, for said alarm buffer
registers to provide a history of alarms so as to facilitate
the search for a particular alarm, said alarm buffer registers
having an alarm threshold associated with each
instantaneous threshold value which, when exceeded,
triggers an event.

15. A monitoring process according to claim 14, wherein
for the application of said process, there is included a
second intelligent agent in the server for measuring and
managing a machine, said second agent being integrated into
the information construction module of the operating
system, and adapted to automatically collect information
which is then automatically produced in the form of
presentation reports that are automatically updated, means
for producing by said second agent the pages relating to the
utilization of the machines monitored, means for collecting
the raw data relative to the utilization of the machines and
calculating the usable data to supply the instantaneous data
to be calculated, means for calculating the passing of
thresholds and producing the history of the data calculated,
said second agent being constituted by three main parts, a
data collector which collects the data in patterns to
determine objects to be monitored so as to collect the basic
information necessary, a data manager to rearrange the data
collected from the various machines observed so as to
facilitate access relative to predefined criteria, and a data
reporter to generate standard presentation reports from files
in which all the raw data collected are stored.

request in the page construction module from the network 

and preparing the pages by collecting the necessary data in * * * * * 
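Claims 14 and 15 describe a monitoring agent made of a data collector, a data manager and a data reporter, backed by alarm buffer registers that keep a history of threshold crossings. The following Python sketch illustrates that pipeline under stated assumptions; it is not the patented implementation. The names `collect`, `manage`, `report` and `AlarmBuffer`, the `(machine, metric, value)` sample layout, the use of glob patterns to select monitored objects, averaging as the "usable data" calculation, and a fixed-capacity alarm history are all hypothetical choices made for illustration. Real-time notification is not modeled.

```python
from collections import defaultdict, deque
from fnmatch import fnmatch
from typing import Dict, List, Tuple

# Hypothetical record layout: (machine, metric, raw value).
Sample = Tuple[str, str, float]


def collect(raw: List[Sample], patterns: List[str]) -> List[Sample]:
    """Data collector: keep only samples whose metric matches one of the glob patterns."""
    return [s for s in raw if any(fnmatch(s[1], p) for p in patterns)]


def manage(samples: List[Sample]) -> Dict[str, Dict[str, List[float]]]:
    """Data manager: regroup samples by machine, then by metric, for direct lookup."""
    grouped: Dict[str, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for machine, metric, value in samples:
        grouped[machine][metric].append(value)
    return grouped


class AlarmBuffer:
    """Bounded alarm history with one threshold per metric (an assumed configuration)."""

    def __init__(self, thresholds: Dict[str, float], capacity: int = 100):
        self.thresholds = thresholds
        # A deque with maxlen discards the oldest alarm once capacity is reached.
        self.history: deque = deque(maxlen=capacity)

    def check(self, machine: str, metric: str, value: float) -> bool:
        """Record an alarm and return True when value exceeds the metric's threshold."""
        limit = self.thresholds.get(metric)
        if limit is not None and value > limit:
            self.history.append((machine, metric, value, limit))
            return True
        return False

    def find(self, machine: str) -> list:
        """Search the alarm history for the alarms raised by one machine."""
        return [a for a in self.history if a[0] == machine]


def report(grouped: Dict[str, Dict[str, List[float]]], alarms: AlarmBuffer) -> str:
    """Data reporter: average each metric, check its threshold, emit one line per metric."""
    lines = []
    for machine in sorted(grouped):
        for metric in sorted(grouped[machine]):
            values = grouped[machine][metric]
            mean = sum(values) / len(values)
            flag = " ALARM" if alarms.check(machine, metric, mean) else ""
            lines.append(f"{machine} {metric} {mean:.1f}{flag}")
    return "\n".join(lines)


if __name__ == "__main__":
    raw = [("srv1", "cpu_load", 80.0), ("srv1", "cpu_load", 96.0),
           ("srv2", "cpu_load", 40.0), ("srv1", "disk_free", 12.0)]
    alarms = AlarmBuffer({"cpu_load": 85.0})
    print(report(manage(collect(raw, ["cpu_*"])), alarms))
    print(alarms.find("srv1"))
```

With the sample data above, the `cpu_*` pattern excludes `disk_free`, the averaged load of `srv1` (88.0) exceeds the assumed 85.0 threshold and is flagged, and `find("srv1")` returns that single alarm from the history. Keeping the threshold check separate from grouping mirrors the claimed split between the data manager and the reporting of threshold crossings.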